ABSTRACT

In this paper, an extension of the sparse decomposition problem is
considered and an algorithm for solving it is presented. In this
extension, it is known that one of the shifted versions of a signal
$\mathbf{s}$ (not necessarily the original signal itself) has a sparse
representation on an overcomplete dictionary, and we are looking for
the sparsest representation among the representations of all the
shifted versions of $\mathbf{s}$. The proposed algorithm finds the
required shift and the sparse representation simultaneously.
Experimental results illustrate the performance of the algorithm.

Index Terms — atomic decomposition, sparse decompo- 
sition, sparse representation, overcomplete signal representa- 
tion, sparse source separation 

1. INTRODUCTION 

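In the classical atomic decomposition problem [1], we have a signal
$s(t)$ whose samples are collected in the $n \times 1$ signal vector
$\mathbf{s} = [s(1), \dots, s(n)]^T$, and we would like to represent it
as a linear combination of $m$ signal vectors
$\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_m$, each of size
$n \times 1$. Following [2], the vectors $\boldsymbol{\varphi}_i$,
$1 \le i \le m$, are called atoms, and they collectively form a
dictionary over which the signal is to be decomposed. We may write

$$\mathbf{s} = \sum_{i=1}^{m} \alpha_i \boldsymbol{\varphi}_i = \boldsymbol{\Phi} \boldsymbol{\alpha}, \qquad (1)$$

where $\boldsymbol{\Phi} \triangleq [\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_m]$
is the $n \times m$ dictionary (matrix) and
$\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_m)^T$ is the $m \times 1$
vector of coefficients.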
A dictionary with $m > n$ is called overcomplete. Although $m = n$ is
sufficient to obtain such a decomposition (as is done in the Discrete
Fourier Transform), overcomplete dictionaries have many advantages in
diverse applications (see, for example, [3] and the references
therein). Note that in the overcomplete case the representation is not
unique, but all these applications need a sparse representation, that
is, the signal $\mathbf{s}$ should be represented as a linear
combination of as few atoms of the dictionary as possible. It has been
shown [5] that, under some mild conditions on the dictionary matrix,
if there is a sparse representation with at most $n/2$ non-zero
coefficients, then this representation is unique. The main approaches
for finding this sparse solution include Matching Pursuit (MP) [6],
FOCUSS, Basis Pursuit (BP) [1], and Smoothed $\ell^0$ (SL0) [7].

In this paper, we introduce a generalization of this classical problem
to a case that we call 'convolutive sparse representation'. In this
case, it is known that the signal $\mathbf{s}$ has a sparse
representation not over the dictionary itself, but over some (unknown)
shifted versions of the atoms. To state the problem more clearly,
consider a representation of the form:

$$\mathbf{s} = \sum_{i=1}^{m} \alpha_i \boldsymbol{\varphi}_i^{(k_i)}, \qquad (2)$$

where $\boldsymbol{\varphi}_i^{(k_i)}$ stands for the $k_i$-sample
(circularly) shifted version of $\boldsymbol{\varphi}_i$. Then, our
problem is to find the sparsest representation in the form of (2)
among all the possible values for $k_1, \dots, k_m$.

Note also that the Fourier transform does not convert this problem to
the classical sparse representation (1) in the frequency domain: the
problem in the transformed domain will be similar to (1), but with
time varying $\alpha_i$'s.

In this paper, we address only a special case of the general problem
(2), namely the case where all the shifts $k_i$ are equal. This is
equivalent to the following simplified problem: an unknown shifted
version of $\mathbf{s}$ has a sparse representation over the
dictionary, and we would like to find this representation.
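The equivalence stated above can be checked numerically. The following is a minimal sketch, assuming NumPy, with `np.roll` standing in for the circular shift; the sizes, the random dictionary and the chosen shift are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 8, 12, 3                      # illustrative sizes and common shift (assumptions)

Phi = rng.standard_normal((n, m))       # columns are the atoms
alpha = np.zeros(m)
alpha[[1, 7]] = [2.0, -1.5]             # a sparse coefficient vector

# Combine atoms that were all circularly shifted by the same k samples.
Phi_shifted = np.roll(Phi, k, axis=0)
s = Phi_shifted @ alpha

# Shifting s back by k samples gives a signal that is sparse over Phi itself.
s_unshifted = np.roll(s, -k)
same = np.allclose(s_unshifted, Phi @ alpha)
print(same)
```

Because the circular shift is a linear operation, applying it to every atom is the same as applying it once to the combined signal, which is why the equal-shift case reduces to finding a single shift of $\mathbf{s}$.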

One of the trivial applications of the general problem is 
to reduce the size of the dictionary in atomic decomposition 
applications. An example for the applications of the above 
simplified problem is where our recorded signal, which has to 
be decomposed as a combination of a small number of atoms 
of the dictionary, is shifted relative to its underlying atoms 
that already exist in the dictionary. 

The paper is organized as follows. In Section 2, the main idea of the
algorithm is introduced. The resulting algorithm is then stated in
Section 3. Finally, simulation results of the algorithm are presented
in Section 4.

2. MAIN IDEA 

Consider a dictionary with atoms
$\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_m$. The problem is
then to sparsely decompose an $n \times 1$ vector $\mathbf{s}$ as a
linear combination of shifted atoms of the dictionary (in this paper,
the shifts are assumed to be circular). One trivial solution to the
problem is to insert all shifted atoms in the dictionary and then find
the sparsest representation of the vector $\mathbf{s}$ over that
dictionary using conventional atomic decomposition methods. However,
this direct solution demands a high computational and storage load.
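To make the storage cost of this trivial solution concrete, the sketch below builds the dictionary of all circular shifts with NumPy. The helper name `all_shifts_dictionary` is hypothetical, and the sizes simply reuse the $n = 40$, $m = 80$ setting of the experiments.

```python
import numpy as np

def all_shifts_dictionary(Phi):
    """Stack every circular shift of every atom; the result has shape (n, m * n)."""
    n, _ = Phi.shape
    return np.hstack([np.roll(Phi, k, axis=0) for k in range(n)])

Phi = np.random.default_rng(0).standard_normal((40, 80))
Phi_big = all_shifts_dictionary(Phi)
print(Phi_big.shape)   # (40, 3200)
```

The number of atoms grows by a factor of $n$, so any conventional sparse solver would have to work with a dictionary that is $n$ times wider.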

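Let also $k_i$ be a continuous variable (a non-integer shift $k_i$ can
be thought of as shifting the envelope of the signal and then
re-sampling it). To handle circular shifts more easily, we take the
Discrete Fourier Transform (DFT) of both sides of (2) to obtain:

$$\mathbf{s}^{(F)} = \sum_{i=1}^{m} \alpha_i \mathbf{W}_i \boldsymbol{\varphi}_i^{(F)}, \qquad (3)$$

in which $\mathbf{s}^{(F)}$ and $\boldsymbol{\varphi}_i^{(F)}$ are the
DFTs of the signals $\mathbf{s}$ and $\boldsymbol{\varphi}_i$,
respectively, and $\mathbf{W}_i$ is the $n \times n$ diagonal matrix
that represents the $k_i$-sample circular shift in the DFT domain: by
the shift property of the DFT,
$\mathbf{W}_i = \mathrm{diag}(1, w_i, \dots, w_i^{n-1})$, where $w_i$
is a unit-modulus complex number determined by $k_i$.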
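As stated in the introduction, in this paper we consider only the case
in which all the shifts are equal, that is,
$\mathbf{W}_1 = \mathbf{W}_2 = \dots = \mathbf{W}_m$, and we present an
iterative algorithm to solve the problem in this case. This case is
equivalent to assuming that the atoms of the dictionary are fixed and
the signal $\mathbf{s}$ is shifted in the opposite direction. In this
case:

$$\mathbf{W}_1 = \mathbf{W}_2 = \dots = \mathbf{W}_m = \mathbf{W},$$

and hence from (3) we have:

$$\mathbf{s}^{(F)} = \mathbf{W} \sum_{i=1}^{m} \alpha_i \boldsymbol{\varphi}_i^{(F)} \qquad (4)$$

or:

$$\mathbf{W}' \mathbf{s}^{(F)} = \sum_{i=1}^{m} \alpha_i \boldsymbol{\varphi}_i^{(F)} = \boldsymbol{\Phi}^{(F)} \boldsymbol{\alpha} \qquad (5)$$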

where $\mathbf{W}' = \mathbf{W}^{-1} = \mathrm{diag}(1, w', \dots, (w')^{n-1})$,
in which $w' = e^{j 2\pi k / n}$ and $k$ is the common shift. Now the
problem is to find the sparsest solution of (5). To do so, we need
some criterion $F(\boldsymbol{\alpha})$ for the sparseness of the
solution vector $\boldsymbol{\alpha}$, and we optimize that criterion
subject to the constraint (5) using optimization methods. Note also
that one of the unknowns, $k$, does not appear in the objective
function $F(\boldsymbol{\alpha})$; it appears only in the constraint
(5). As their objective functions, two classical sparse decomposition
approaches use the $\ell^1$-norm [1] and the smoothed $\ell^0$ (SL0)
norm [7]. Here we use the second one, because it results in a very
fast and accurate algorithm in classical atomic decomposition [7], and
also because it is a differentiable measure of the sparsity of
$\boldsymbol{\alpha}$. The smoothed $\ell^0$-norm of a vector is an
approximation to its $\ell^0$-norm (the number of its non-zero
elements), and is defined as:

$$F(\boldsymbol{\alpha}) = m - \sum_{i=1}^{m} e^{-\alpha_i^2 / 2\sigma^2}, \qquad (6)$$

where $\sigma$ is a parameter which specifies a tradeoff between
smoothness and the accuracy of the approximation: the smaller
$\sigma$, the better the approximation of the $\ell^0$-norm; the
larger $\sigma$, the smoother the objective function.
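The behaviour of (6) as $\sigma$ varies can be seen in a few lines. This is a minimal sketch assuming NumPy; the function name `smoothed_l0` and the test vector are illustrative choices, not part of the paper.

```python
import numpy as np

def smoothed_l0(alpha, sigma):
    """F(alpha) = m - sum_i exp(-alpha_i^2 / (2 sigma^2)), as in (6)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha.size - np.sum(np.exp(-alpha**2 / (2.0 * sigma**2)))

alpha = np.array([0.0, 3.0, 0.0, -2.0, 0.0])   # two non-zero entries
for sigma in (10.0, 1.0, 0.1, 0.01):
    print(sigma, smoothed_l0(alpha, sigma))
```

For this vector the value is close to 0 when $\sigma$ is large compared with the entries and tends to 2, the number of non-zero entries, as $\sigma$ shrinks.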

On the other hand, (5) can be written as:

$$G(\boldsymbol{\alpha}, w') = \| \mathbf{W}' \mathbf{s}^{(F)} - \boldsymbol{\Phi}^{(F)} \boldsymbol{\alpha} \|^2 = 0. \qquad (7)$$

Now we should minimize (6) subject to (7), for a small value of
$\sigma$. Note that one of the optimization variables, $w'$, is not
present in $F(\boldsymbol{\alpha})$ and appears only in (7).
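As an illustration of the constraint (7), the sketch below evaluates $G$ at every integer shift for a noiseless signal with known coefficients. It assumes NumPy, uses `np.roll` with a positive argument as the circular shift of the observed signal, and takes $\mathbf{W}' = \mathrm{diag}(e^{j\theta l})$, $l = 0, \dots, n-1$, with $\theta = 2\pi k / n$; the sizes and data are made up for the example.

```python
import numpy as np

def G(alpha, theta, s_F, Phi_F):
    """Squared residual of (7) with W' = diag(exp(j * theta * l)), l = 0..n-1."""
    l = np.arange(s_F.size)
    w_prime = np.exp(1j * theta * l)          # diagonal of W'
    r = w_prime * s_F - Phi_F @ alpha
    return float(np.real(np.vdot(r, r)))

rng = np.random.default_rng(1)
n, m, k_true = 16, 32, 5
Phi = rng.standard_normal((n, m))
alpha = np.zeros(m)
alpha[[3, 20]] = [1.0, -2.0]
s = np.roll(Phi @ alpha, k_true)              # observed, circularly shifted signal

s_F = np.fft.fft(s)
Phi_F = np.fft.fft(Phi, axis=0)               # DFT of every atom (column)

values = [G(alpha, 2 * np.pi * k / n, s_F, Phi_F) for k in range(n)]
k_best = int(np.argmin(values))
print(k_best, values[k_best])
```

With the true coefficients, the residual vanishes (up to rounding) only at the true shift, which is the sense in which the shift enters the problem through the constraint alone.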

Note that for small values of $\sigma$, $F$ contains a lot of local
minima. Consequently, it is very difficult to directly minimize this
function for very small values of $\sigma$. The idea of [7] for
escaping from local minima is then to decrease the value of $\sigma$
gradually: for each value of $\sigma$, the minimization algorithm is
initialized with the minimizer of $F$ for the previous (larger) value
of $\sigma$. This idea for minimizing a non-convex function is called
Graduated Non-Convexity [8], and is also used in simulated annealing
methods.

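To start the minimization, we need a proper initial guess
$\boldsymbol{\alpha}_0$ for the solution, that is, an initial estimate
of the sparsest solution of $\boldsymbol{\Phi} \boldsymbol{\alpha} = \mathbf{s}$.
It has been shown that, for the simple sparse decomposition problem,
the best initial value for $\boldsymbol{\alpha}$ is the minimum
$\ell^2$-norm solution of $\boldsymbol{\Phi} \boldsymbol{\alpha} = \mathbf{s}$,
that is, $\boldsymbol{\alpha}_0 = \boldsymbol{\Phi}^T (\boldsymbol{\Phi} \boldsymbol{\Phi}^T)^{-1} \mathbf{s}$.
The reason is that this solution minimizes the function
$F(\boldsymbol{\alpha})$ subject to
$\boldsymbol{\Phi} \boldsymbol{\alpha} = \mathbf{s}$ when $\sigma$ goes
to infinity. Although our method is somewhat different from the method
presented in [7], we use the same initialization for our algorithm.
Since we also have the variable $w'$, we should start from the
sparsest of the vectors
$\boldsymbol{\alpha}(k) = \boldsymbol{\Phi}^T (\boldsymbol{\Phi} \boldsymbol{\Phi}^T)^{-1} \mathbf{s}^{(k)}$,
where $\mathbf{s}^{(k)}$ denotes the $k$-sample shifted version of
$\mathbf{s}$. Let $k_0 = \arg\min_k F(\boldsymbol{\alpha}(k))$, for
$k = 1, 2, \dots, n$. Then we choose $\boldsymbol{\alpha}(k_0)$ to be
the starting point of our algorithm.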
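The initialization described above can be sketched as follows, assuming NumPy. The helper names are hypothetical, the value of $\sigma$ used to score the candidates is an assumption (the text does not state it), and the shifts are indexed $0, \dots, n-1$, with $k = 0$ playing the role of $k = n$.

```python
import numpy as np

def smoothed_l0(alpha, sigma):
    return alpha.size - np.sum(np.exp(-alpha**2 / (2.0 * sigma**2)))

def initial_guess(s, Phi, sigma):
    """Return (k0, alpha0): the shift whose minimum l2-norm solution has the
    smallest smoothed-l0 measure, together with that solution."""
    n = s.size
    Phi_pinv = np.linalg.pinv(Phi)     # equals Phi^T (Phi Phi^T)^-1 when Phi has full row rank
    # Column k holds s shifted back by k samples.
    S = np.stack([np.roll(s, -k) for k in range(n)], axis=1)
    A = Phi_pinv @ S                   # column k is alpha(k)
    scores = [smoothed_l0(A[:, k], sigma) for k in range(n)]
    k0 = int(np.argmin(scores))
    return k0, A[:, k0]

rng = np.random.default_rng(4)
n, m, k_true = 16, 32, 5
Phi = rng.standard_normal((n, m))
alpha_true = np.zeros(m)
alpha_true[[2, 9, 30]] = [1.5, -1.0, 2.0]
s = np.roll(Phi @ alpha_true, k_true)

k0, alpha0 = initial_guess(s, Phi, sigma=1.0)
print(k0, np.allclose(Phi @ alpha0, np.roll(s, -k0)))
```

The returned vector satisfies the constraint exactly for the selected shift; whether that shift coincides with the true one depends on the data and on the chosen $\sigma$, so the sketch makes no claim about it.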

Because of noise, if our algorithm tries to satisfy (7) exactly, it
will be very sensitive to noise. Consequently, we try to satisfy this
equation only approximately. We realize this idea


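• Initialization:

  1. Let $F(\boldsymbol{\alpha}) = m - \sum_{i=1}^{m} e^{-\alpha_i^2 / 2\sigma^2}$ and
     $\boldsymbol{\alpha}(k) = \boldsymbol{\Phi}^T (\boldsymbol{\Phi} \boldsymbol{\Phi}^T)^{-1} \mathbf{s}^{(k)}$.

  2. Find the minimum of $F(\boldsymbol{\alpha}(k))$ for
     $k = 1, 2, \dots, n$. Assuming this minimum occurs for $k = k_0$,
     let $\boldsymbol{\alpha}_0 = \boldsymbol{\alpha}(k_0)$ and
     $\theta_0 = 2\pi k_0 / n$.

  3. Choose a suitable decreasing sequence for $\sigma$,
     $[\sigma_1 \dots \sigma_R]$. Choose a small value for $\mu$.

  4. Let $\mathbf{M} = \mathrm{diag}(0, 1j, \dots, (n-1)j)$.

• For $r = 1, \dots, R$:

  1. Let $\sigma = \sigma_r$.

  2. Minimize (approximately) the function
     $H(\boldsymbol{\alpha}, \theta)$ using $L$ iterations of the
     steepest descent algorithm:

     - Initialization: $\boldsymbol{\alpha} = \boldsymbol{\alpha}_{r-1}$ and $\theta = \theta_{r-1}$.

     - for $l = 1 \dots L$ (loop $L$ times):

       (a) Calculate $\partial H / \partial \boldsymbol{\alpha}$ and
           $\partial H / \partial \theta$ from (9) and (10),
           respectively.

       (b) If $H(\boldsymbol{\alpha} - \mu\, \partial H / \partial \boldsymbol{\alpha},\ \theta - \mu\, \partial H / \partial \theta) < H(\boldsymbol{\alpha}, \theta)$,
           let $\rho = 1.2$, else $\rho = 0.5$.

       (c) Let $\boldsymbol{\alpha} \leftarrow \boldsymbol{\alpha} - \mu\, \partial H / \partial \boldsymbol{\alpha}$
           and $\theta \leftarrow \theta - \mu\, \partial H / \partial \theta$.

       (d) Let $\mu \leftarrow \mu \times \rho$.

  3. Set $\boldsymbol{\alpha}_r = \boldsymbol{\alpha}$ and $\theta_r = \theta$.

• Let $\hat{\boldsymbol{\alpha}} = \boldsymbol{\alpha}_R$ and
  $\hat{\theta} = \theta_R$. The final coefficient vector is
  $\hat{\boldsymbol{\alpha}}$ and the final shift value is
  $n \times \hat{\theta} / 2\pi$.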



by minimizing the function $H$ defined below with respect to
$\boldsymbol{\alpha}$ and $w'$:

$$H(\boldsymbol{\alpha}, w') = \lambda\, G(\boldsymbol{\alpha}, w') + (1 - \lambda)\, F(\boldsymbol{\alpha}), \qquad (8)$$

where $0 < \lambda < 1$ is a constant that specifies the weight given
to satisfying (7). This equation can be interpreted as a trade-off
between the accuracy of the decomposition and maximizing the sparsity.

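By letting $w' = e^{j\theta}$, the final objective function
$H(\boldsymbol{\alpha}, \theta)$ will be a real-valued function of the
real-valued variables $\boldsymbol{\alpha}$ and $\theta$. For each
$\sigma$, this function may be minimized by gradient based algorithms
(specifically steepest descent). Direct calculations show:

$$\frac{\partial H}{\partial \boldsymbol{\alpha}} = 2\lambda\, \Re\{ \boldsymbol{\Phi}^{(F)H} (\boldsymbol{\Phi}^{(F)} \boldsymbol{\alpha} - \mathbf{W}' \mathbf{s}^{(F)}) \} + (1 - \lambda) \frac{1}{\sigma^2} [\alpha_1 e^{-\alpha_1^2 / 2\sigma^2}, \dots, \alpha_m e^{-\alpha_m^2 / 2\sigma^2}]^T \qquad (9)$$

$$\frac{\partial H}{\partial \theta} = -2\lambda\, \Re\{ (\boldsymbol{\Phi}^{(F)} \boldsymbol{\alpha})^{H} \mathbf{M} \mathbf{W}' \mathbf{s}^{(F)} \} \qquad (10)$$

where $\mathbf{M} = \mathrm{diag}(0, 1j, \dots, (n-1)j)$ and
$(\cdot)^H$ denotes the conjugate transpose.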
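The gradient expressions can be checked against central finite differences. The sketch below assumes NumPy, takes $\mathbf{W}' = \mathrm{diag}(e^{j\theta l})$ with $l = 0, \dots, n-1$, and writes the derivative with respect to $\theta$ as $-2\lambda\,\Re\{(\boldsymbol{\Phi}^{(F)}\boldsymbol{\alpha})^H \mathbf{M}\mathbf{W}'\mathbf{s}^{(F)}\}$, which is what differentiating (7) and (8) gives under that assumption; the test data are random and purely illustrative.

```python
import numpy as np

def H(alpha, theta, s_F, Phi_F, lam, sigma):
    l = np.arange(s_F.size)
    r = np.exp(1j * theta * l) * s_F - Phi_F @ alpha       # W' s^(F) - Phi^(F) alpha
    G = np.real(np.vdot(r, r))
    F = alpha.size - np.sum(np.exp(-alpha**2 / (2 * sigma**2)))
    return lam * G + (1 - lam) * F

def grad_H(alpha, theta, s_F, Phi_F, lam, sigma):
    l = np.arange(s_F.size)
    Ws = np.exp(1j * theta * l) * s_F                      # W' s^(F)
    Pa = Phi_F @ alpha                                     # Phi^(F) alpha
    g_alpha = (2 * lam * np.real(Phi_F.conj().T @ (Pa - Ws))
               + (1 - lam) / sigma**2 * alpha * np.exp(-alpha**2 / (2 * sigma**2)))
    # M W' s^(F) is 1j * l * Ws because M = diag(0, 1j, ..., (n-1)j).
    g_theta = -2 * lam * np.real(np.vdot(Pa, 1j * l * Ws))
    return g_alpha, g_theta

rng = np.random.default_rng(2)
n, m = 8, 12
Phi_F = np.fft.fft(rng.standard_normal((n, m)), axis=0)
s_F = np.fft.fft(rng.standard_normal(n))
alpha = rng.standard_normal(m)
theta, lam, sigma, eps = 0.7, 0.75, 0.5, 1e-6

g_alpha, g_theta = grad_H(alpha, theta, s_F, Phi_F, lam, sigma)
num_theta = (H(alpha, theta + eps, s_F, Phi_F, lam, sigma)
             - H(alpha, theta - eps, s_F, Phi_F, lam, sigma)) / (2 * eps)
num_alpha = np.array([
    (H(alpha + eps * e, theta, s_F, Phi_F, lam, sigma)
     - H(alpha - eps * e, theta, s_F, Phi_F, lam, sigma)) / (2 * eps)
    for e in np.eye(m)])

ok_alpha = np.allclose(g_alpha, num_alpha, rtol=1e-4, atol=1e-4)
ok_theta = bool(np.isclose(g_theta, num_theta, rtol=1e-4, atol=1e-4))
print(ok_alpha, ok_theta)
```

A check of this kind is a cheap safeguard before running steepest descent, since a sign error in either gradient would silently turn descent into ascent.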

3. THE FINAL ALGORITHM 

The final algorithm of the proposed method is given in Fig. 1. As seen
in the algorithm, the final values of the previous estimation are used
to initialize the next steepest descent. As explained in the previous
section, the decreasing sequence of $\sigma$ is used to avoid getting
trapped in local minima.

In the minimization part, steepest descent with a variable step-size
($\mu$) is used: if $\mu$ is such that
$H(\boldsymbol{\alpha} - \mu\, \partial H / \partial \boldsymbol{\alpha},\ \theta - \mu\, \partial H / \partial \theta) < H(\boldsymbol{\alpha}, \theta)$,
we multiply the value of $\mu$ by 1.2 for the next iteration.
Otherwise, if $\mu$ is such that
$H(\boldsymbol{\alpha} - \mu\, \partial H / \partial \boldsymbol{\alpha},\ \theta - \mu\, \partial H / \partial \theta) \ge H(\boldsymbol{\alpha}, \theta)$,
we multiply the value of $\mu$ by 0.5 for the next iteration.
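The steps of the algorithm can be transcribed into a short NumPy sketch. Everything not stated in the text is an assumption and is untuned: the function name `shifted_sparse_decomposition`, the $\sigma$ sequence, the initial $\mu$, the number of iterations $L$, the use of the first $\sigma$ to score the initial candidates, and the orthonormal DFT scaling. The sketch illustrates the control flow only and makes no attempt to reproduce the reported results.

```python
import numpy as np

def F(alpha, sigma):
    return alpha.size - np.sum(np.exp(-alpha**2 / (2 * sigma**2)))

def H(alpha, theta, s_F, Phi_F, lam, sigma):
    l = np.arange(s_F.size)
    r = np.exp(1j * theta * l) * s_F - Phi_F @ alpha
    return lam * np.real(np.vdot(r, r)) + (1 - lam) * F(alpha, sigma)

def grad_H(alpha, theta, s_F, Phi_F, lam, sigma):
    l = np.arange(s_F.size)
    Ws = np.exp(1j * theta * l) * s_F
    Pa = Phi_F @ alpha
    g_a = (2 * lam * np.real(Phi_F.conj().T @ (Pa - Ws))
           + (1 - lam) / sigma**2 * alpha * np.exp(-alpha**2 / (2 * sigma**2)))
    g_t = -2 * lam * np.real(np.vdot(Pa, 1j * l * Ws))
    return g_a, g_t

def shifted_sparse_decomposition(s, Phi, sigmas, lam=0.75, mu=1e-6, L=50):
    n, m = Phi.shape
    s_F = np.fft.fft(s, norm="ortho")                 # DFT scaling is an assumption
    Phi_F = np.fft.fft(Phi, axis=0, norm="ortho")
    # Initialization: minimum l2-norm solution for every integer shift.
    Phi_pinv = np.linalg.pinv(Phi)
    cands = [Phi_pinv @ np.roll(s, -k) for k in range(n)]
    k0 = int(np.argmin([F(a, sigmas[0]) for a in cands]))
    alpha, theta = cands[k0], 2 * np.pi * k0 / n
    for sigma in sigmas:                              # decreasing sequence sigma_1 > ... > sigma_R
        for _ in range(L):
            g_a, g_t = grad_H(alpha, theta, s_F, Phi_F, lam, sigma)
            trial = H(alpha - mu * g_a, theta - mu * g_t, s_F, Phi_F, lam, sigma)
            rho = 1.2 if trial < H(alpha, theta, s_F, Phi_F, lam, sigma) else 0.5
            alpha, theta = alpha - mu * g_a, theta - mu * g_t   # step (c): always applied
            mu *= rho                                           # step (d)
    shift = (n * theta / (2 * np.pi)) % n             # final shift value n * theta / (2 pi)
    return alpha, shift

rng = np.random.default_rng(3)
n, m = 40, 80
Phi = rng.standard_normal((n, m))
alpha_true = rng.standard_normal(m) * (rng.random(m) < 0.1)
s = np.roll(Phi @ alpha_true, 7)
alpha_hat, shift_hat = shifted_sparse_decomposition(
    s, Phi, sigmas=[1.0, 0.5, 0.2, 0.1, 0.05])
print(shift_hat)
```

As in the description above, the update in step (c) is applied whether or not the trial point decreases $H$; only the step-size for the next iteration changes. The quality of the estimate depends strongly on the assumed parameters.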

4. EXPERIMENTAL RESULTS 

In order to experimentally evaluate our method, we generated a random
dictionary $\boldsymbol{\Phi}$ with 80 atoms, each atom being a signal
of length 40 (thus $m = 80$ and $n = 40$ in our simulations). Then we
created a synthetic vector $\mathbf{s}$ by generating a sparse
coefficient vector $\boldsymbol{\alpha}$ at random, using a
Bernoulli-Gaussian model: each coefficient is 'active' with
probability $p$, and is 'inactive' with probability $1 - p$. If it is
active, its value is modeled by a zero-mean Gaussian random variable
with variance $\sigma_{\mathrm{on}}^2$; if it is not active, its value
is modeled by a zero-mean Gaussian random variable with variance
$\sigma_{\mathrm{off}}^2$, where
$\sigma_{\mathrm{off}} \ll \sigma_{\mathrm{on}}$. Consequently, each
$\alpha_i$ is distributed as:

$$\alpha_i \sim p \cdot \mathcal{N}(0, \sigma_{\mathrm{on}}) + (1 - p) \cdot \mathcal{N}(0, \sigma_{\mathrm{off}}), \qquad (11)$$



Fig. 1. The final algorithm 

where $p$ denotes the probability of activity of the coefficient, and
sparsity implies that $p \ll 1$. In our simulations we have fixed
$\sigma_{\mathrm{on}} = 1$, $\sigma_{\mathrm{off}} = 0.01$, $p = 0.1$,
and $\lambda = 0.75$.

Then we created the signal $\mathbf{s}$ by
$\mathbf{s} = \boldsymbol{\Phi} \boldsymbol{\alpha} + \mathbf{n}$,
where $\mathbf{n}$ is an additive white Gaussian noise with zero mean
and standard deviation 0.01. Finally, we shifted the vector
$\mathbf{s}$ circularly by $k$ samples, where $k$ was a random number
from 0 to 39. We applied our algorithm to convolutively decompose this
vector $\mathbf{s}$ over the dictionary $\boldsymbol{\Phi}$.
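The data generation described above can be sketched as follows, assuming NumPy. The function name `make_experiment` is hypothetical, the dictionary entries are drawn from a standard Gaussian (the text only says the dictionary is random), and $\sigma_{\mathrm{on}}$, $\sigma_{\mathrm{off}}$ are used as standard deviations.

```python
import numpy as np

def make_experiment(rng, n=40, m=80, p=0.1, sigma_on=1.0, sigma_off=0.01,
                    noise_std=0.01):
    Phi = rng.standard_normal((n, m))            # random dictionary (distribution assumed)
    active = rng.random(m) < p                   # Bernoulli activity pattern
    alpha = np.where(active, sigma_on, sigma_off) * rng.standard_normal(m)
    s = Phi @ alpha + noise_std * rng.standard_normal(n)
    k = int(rng.integers(0, n))                  # random shift in 0..n-1
    return Phi, alpha, np.roll(s, k), k

rng = np.random.default_rng(5)
Phi, alpha, s_obs, k = make_experiment(rng)
print(Phi.shape, alpha.shape, s_obs.shape, k)
```

The function returns the shifted, noisy signal together with the true coefficients and shift, so that an estimate can later be compared against them.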

The simulation was repeated 1000 times with randomly generated
coefficients, dictionary and signal shift, and in 992 experiments the
algorithm successfully estimated the shift value and the coefficient
vector $\boldsymbol{\alpha}$. On average, the Signal to Noise Ratio¹
(SNR) was greater than 24 dB. Figure 2 shows one of the runs of these
experiments. In the other 8 experiments the algorithm fell into local
minima and could not correctly estimate $\boldsymbol{\alpha}$ and
$\theta$.

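¹ Signal to Noise Ratio is defined as
$10 \log_{10} \left( \|\boldsymbol{\alpha}\|^2 / \|\boldsymbol{\alpha} - \hat{\boldsymbol{\alpha}}\|^2 \right)$,
where $\hat{\boldsymbol{\alpha}}$ is the estimated coefficient vector.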
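The SNR defined in the footnote is straightforward to compute. This is a minimal sketch assuming NumPy; the function name `snr_db` and the example vectors are illustrative.

```python
import numpy as np

def snr_db(alpha, alpha_hat):
    """10 * log10(||alpha||^2 / ||alpha - alpha_hat||^2)."""
    alpha = np.asarray(alpha, dtype=float)
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    return 10.0 * np.log10(np.sum(alpha**2) / np.sum((alpha - alpha_hat)**2))

alpha = np.array([1.0, 0.0, -2.0, 0.0])
alpha_hat = np.array([1.1, 0.0, -2.0, 0.0])
value = snr_db(alpha, alpha_hat)
print(value)
```

In this example the energy ratio is 500, which corresponds to about 27 dB.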
Fig. 2. A sample of our experiments. From top to bottom: the first
plot is a randomly generated coefficient vector $\boldsymbol{\alpha}$;
the second plot is the synthetic vector $\mathbf{s}$, which has the
coefficient vector $\boldsymbol{\alpha}$ on the randomly generated
dictionary $\boldsymbol{\Phi}$; the third plot is a randomly shifted
version of the vector $\mathbf{s}$, which is the input of our
algorithm; the fourth plot is the estimated coefficient vector
$\hat{\boldsymbol{\alpha}}$; and the last plot is the vector
$\mathbf{s}$ reconstructed from the coefficient vector
$\hat{\boldsymbol{\alpha}}$.

In order to see the effect of $\lambda$ on the estimation quality, the
algorithm was repeated for values of $\lambda$ between 0.3 and 0.9
(outside this interval the SNR decreases rapidly). For each value of
$\lambda$ we repeated the algorithm 100 times and computed the mean
SNR. The mean SNR is plotted versus $\lambda$ in Fig. 3.



5. CONCLUSION 

In this paper, a new method was proposed as a first step towards
solving the convolutive sparse decomposition problem. The proposed
method can be used in cases in which we know that one of the shifted
versions of a signal $\mathbf{s}$ has a sparse representation on an
overcomplete dictionary, and we are looking for the sparsest
representation among the representations of all the shifted versions
of $\mathbf{s}$. We used the Discrete Fourier Transform (DFT) to
convert the problem to a continuous optimization problem. The proposed
method is fast because it uses the idea of the smoothed $\ell^0$-norm
[7]. Experimental results illustrated the performance of the proposed
algorithm.

It seems that the proposed algorithm can be generalized to the general
convolutive sparse representation problem (in which the shift values
$k_i$ are not necessarily equal). However, our simulations show that
the main difficulty of such a generalization is that the algorithm is
very often trapped in local minima. Such a generalization is currently
under study in our group.



Fig. 3. Output SNR versus $\lambda$.